Improved prediction of bacterial transcription start sites

Authors

  • James J. Gordon
  • Michael W. Towsey
  • James M. Hogan
  • Sarah A. Mathews
  • Peter Timms
Abstract

MOTIVATION: Identifying bacterial promoters is an important step towards understanding gene regulation. In this paper, we address the problem of predicting the location of promoters and their transcription start sites (TSSs) in Escherichia coli. The accepted method for this problem is to use position weight matrices (PWMs), which define conserved motifs at the sigma-factor binding site. However, this method is known to produce large numbers of false positive predictions.

RESULTS: Our approaches to TSS prediction are based upon an ensemble of support vector machines (SVMs) employing a variant of the mismatch string kernel. This classifier is subsequently combined with a PWM and a model based on the distribution of distances from TSS to gene start. We investigate the effect of different scoring techniques and quantify performance using the area under a detection-error tradeoff curve. When tested on a biologically realistic task, our method provides performance comparable with or superior to the best reported for this task. False positives are significantly reduced, an improvement of great significance to biologists.

AVAILABILITY: The trained ensemble-SVM model with instructions on usage can be downloaded from http://eresearch.fit.qut.edu.au/downloads
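For readers unfamiliar with the mismatch string kernel mentioned in the abstract, the following is a minimal Python sketch of the standard (k, m)-mismatch kernel on DNA sequences. It is not the authors' variant, which the abstract does not specify. The function names and the default values of `k` and `m` are illustrative assumptions. Each k-mer in a sequence contributes a count to every k-mer within Hamming distance `m` of it, and the kernel value is the dot product of the two resulting count vectors.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGT"  # DNA alphabet; an assumption for this sketch


def neighbours(kmer, m):
    """Return the set of k-mers within Hamming distance <= m of kmer."""
    result = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(ALPHABET, repeat=d):
                candidate = list(kmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                result.add("".join(candidate))
    return result


def mismatch_features(seq, k, m):
    """Map a sequence to counts over the k-mers in its mismatch neighbourhoods."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for nb in neighbours(seq[i:i + k], m):
            features[nb] += 1
    return features


def mismatch_kernel(x, y, k=3, m=1):
    """Standard (k, m)-mismatch kernel: dot product of the two feature maps."""
    fx = mismatch_features(x, k, m)
    fy = mismatch_features(y, k, m)
    return sum(count * fy[kmer] for kmer, count in fx.items())


if __name__ == "__main__":
    # Two 3-mers that differ only in the last base share four neighbours.
    print(mismatch_kernel("ACG", "ACT", k=3, m=1))
```

A kernel function of this form could be passed as a precomputed Gram matrix to an off-the-shelf SVM implementation. The enumeration of neighbours grows quickly with `k` and `m`, so this brute-force form is only practical for small values.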

Similar resources

PREDetector: a new tool to identify regulatory elements in bacterial genomes.

In the post-genomic area, the prediction of transcription factor regulons by position weight matrix-based programmes is a powerful approach to decipher biological pathways and to modelize regulatory networks in bacteria. The main difficulty once a regulon prediction is available is to estimate its reliability prior to start expensive experimental validations and therefore trying to find a way h...

SVM Based Prediction of Bacterial Transcription Start Sites

Identifying bacterial promoters is the key to understanding gene expression. Promoters lie in tightly constrained positions relative to the transcription start site (TSS). Knowing the TSS position, one can predict promoter positions to within a few base pairs, and vice versa. As a route to promoter identification, we formally address the problem of TSS prediction, drawing on the RegulonDB datab...

GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.

Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence of relatively strong sequence patterns identifying true translation initiation sites. In the current paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-...

The Prediction of Bacterial Transcription Start Sites Using SVMs

Identifying promoters is the key to understanding gene expression in bacteria. Promoters lie in tightly constrained positions relative to the transcription start site (TSS). In this paper, we address the problem of predicting transcription start sites in Escherichia coli. Knowing the TSS position, one can then predict the promoter position to within a few base pairs, and vice versa. The accepte...

Mycobacterium avium subsp. paratuberculosis induces differential cytosine methylation at miR-21 transcription start site region

Mycobacterium avium subspecies paratuberculosis (MAP), as an obligate intracellular bacterium, causes paratuberculosis (Johne’s disease) in ruminants. Plus, MAP has consistently been isolated from Crohn’s disease (CD) lesions in humans; a notion implying possible direct causative ...

Journal title:
  • Bioinformatics

Volume 22, Issue 2

Pages: -

Publication date: 2006